This repository has been archived by the owner on Feb 12, 2022. It is now read-only.

[Maulik] batch commits from offset commit worker #45

Merged

mauliksoneji merged 1 commit into master from commit_batching on Apr 24, 2020

mauliksoneji
Contributor

Address Issue #44 by introducing batch commits in OffsetCommitWorker

mauliksoneji force-pushed the commit_batching branch 2 times, most recently from fb2a52f to 5c765c0 on April 17, 2020 09:02
int offsetClubbedBatches = 0;
while (true) {
    Records commitOffset = commitQueue.poll(queueConfig.getTimeout(), queueConfig.getTimeoutUnit());
    if (stopped || clock.currentEpochMillis() - start > offsetState.getOffsetCommitTime()) {
Contributor

Looks like getOffsetCommitTime is more of a timeout than a commit time. If I am correct, can we name it accordingly?

Contributor Author

OFFSET_COMMIT_TIME is the time for which we accumulate batches before committing. It is not a timeout; it is the time after which we commit the offsets to Kafka.
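To make the distinction concrete, here is a minimal sketch of a time-window batcher in the spirit of the loop quoted above. It is not Beast's OffsetCommitWorker: the `OffsetBatcher` class, the `Committer` interface, the `Map<String, Long>` (topic-partition key to offset) representation and the injected `LongSupplier` clock are hypothetical simplifications, and `batchDurationMillis` stands in for OFFSET_COMMIT_TIME. The poll timeout only bounds each wait on the queue; the window length alone decides when the single commit happens.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

// Hypothetical sketch: accumulates offsets for a fixed time window, then commits once.
class OffsetBatcher {

    // Hypothetical stand-in for the Kafka commit call (e.g. a commitSync wrapper).
    interface Committer {
        void commit(Map<String, Long> offsets);
    }

    private final BlockingQueue<Map<String, Long>> commitQueue;
    private final Committer committer;
    private final LongSupplier clockMillis;
    private final long batchDurationMillis; // plays the role of OFFSET_COMMIT_TIME
    private volatile boolean stopped = false;

    OffsetBatcher(BlockingQueue<Map<String, Long>> commitQueue, Committer committer,
                  LongSupplier clockMillis, long batchDurationMillis) {
        this.commitQueue = commitQueue;
        this.committer = committer;
        this.clockMillis = clockMillis;
        this.batchDurationMillis = batchDurationMillis;
    }

    void stop() {
        stopped = true;
    }

    // Runs one batching window and returns how many queued batches were clubbed together.
    int runOnce(long pollTimeoutMillis) {
        long start = clockMillis.getAsLong();
        Map<String, Long> pending = new HashMap<>();
        int clubbedBatches = 0;
        while (true) {
            Map<String, Long> batch;
            try {
                // The timeout bounds a single wait; it does not end the window.
                batch = commitQueue.poll(pollTimeoutMillis, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status, flush what we have
                break;
            }
            if (batch != null) {
                // Keep only the highest offset seen per topic-partition.
                batch.forEach((partition, offset) -> pending.merge(partition, offset, Long::max));
                clubbedBatches++;
            }
            // Commit once the window has elapsed (or on shutdown), not on every batch.
            if (stopped || clockMillis.getAsLong() - start > batchDurationMillis) {
                break;
            }
        }
        if (!pending.isEmpty()) {
            committer.commit(pending); // one commit per window
        }
        return clubbedBatches;
    }
}
```

Checking the merge before the elapsed-time test means a batch polled at the end of the window is still included in that window's commit rather than dropped.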

Contributor

Oh, got it. Then I think we should name it something like getOffsetBatchTime or getOffsetBatchDuration, since it is how long the batching process runs before the worker flushes the offsets to Kafka.

Contributor Author

Offset commit time seemed more appropriate, since that is the time after which we commit offsets to Kafka.

Contributor Author

The configuration has been renamed to offsetBatchDuration.

if (partitionsCommitOffset.size() != 0) {
    kafkaCommitter.commitSync(partitionsCommitOffset);
}
log.info("committed offsets partition {} size {}", partitionsCommitOffset.toString(), partitionsCommitOffset.size());
Contributor

Is this log really helpful? Can we either make it debug level or avoid serializing the whole map to a string?

Contributor Author

Since we now commit only after batching, we want to log when the offsets were committed; there is no other info-level log in the OffsetCommitter.

Contributor

@mauliksoneji Yeah, that's fine. I am just wondering what the use of seeing per-partition offset values in the log is. Could we just log the size of the map?

Contributor Author

The map doesn't hold much information: the number of keys is at most the number of topic-partitions subscribed to by the pod, and each value is just the maximum offset, so the log line stays small.
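Both points in this thread can be illustrated with a small sketch using hypothetical names (this is not Beast's code). It assumes offsets are keyed by a topic-partition string such as "topic-0", a simplification of Kafka's TopicPartition keys. Clubbing any number of batches keeps one entry per partition holding its highest offset, so the map logged after a commit is bounded by the number of partitions assigned to the pod. The hypothetical `logLine` helper shows the reviewer's alternative of logging only the map size unless detail is requested.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: clubs offset batches and formats the post-commit log line.
final class OffsetClubbing {

    private OffsetClubbing() {
    }

    // Merges batches so each topic-partition keeps only its highest offset.
    // The result therefore has at most one entry per assigned partition.
    static Map<String, Long> club(List<Map<String, Long>> batches) {
        Map<String, Long> clubbed = new HashMap<>();
        for (Map<String, Long> batch : batches) {
            batch.forEach((partition, offset) -> clubbed.merge(partition, offset, Long::max));
        }
        return clubbed;
    }

    // Size-only message by default; the full map is included only when detailed is true.
    static String logLine(Map<String, Long> committed, boolean detailed) {
        if (detailed) {
            return String.format("committed offsets partition %s size %d", committed, committed.size());
        }
        return String.format("committed offsets for %d partitions", committed.size());
    }
}
```

The boolean flag keeps the sketch dependency-free; with a logging framework the same choice would normally be made through the log level instead.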

src/main/java/com/gojek/beast/config/AppConfig.java (outdated, resolved)
codecov-io commented Apr 24, 2020

Codecov Report

Merging #45 into master will increase coverage by 0.82%.
The diff coverage is 91.83%.

@@             Coverage Diff              @@
##             master      #45      +/-   ##
============================================
+ Coverage     81.73%   82.56%   +0.82%     
+ Complexity      257      255       -2     
============================================
  Files            52       52              
  Lines           805      820      +15     
  Branches         73       71       -2     
============================================
+ Hits            658      677      +19     
+ Misses          126      120       -6     
- Partials         21       23       +2     
Impacted Files | Coverage Δ | Complexity Δ
.../gojek/beast/protomapping/ProtoUpdateListener.java | 57.62% <50.00%> (+4.59%) | 5.00 <0.00> (ø)
...ava/com/gojek/beast/worker/OffsetCommitWorker.java | 94.91% <92.50%> (-5.09%) | 11.00 <2.00> (+1.00) ⬇️
...ain/java/com/gojek/beast/commiter/OffsetState.java | 100.00% <100.00%> (+6.25%) | 10.00 <4.00> (-3.00) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 304883e...f24e943.

kushsharma
Contributor

LGTM

mauliksoneji merged commit 40f9353 into master on Apr 24, 2020
mauliksoneji deleted the commit_batching branch on April 24, 2020 09:38

3 participants